Bayesian neural network approaches to ovarian cancer identification from high-resolution mass spectrometry data

Authors

  • Jiangsheng Yu
  • Xue-wen Chen
Abstract

MOTIVATION: The classification of high-dimensional data is a persistent challenge for statistical machine learning. We propose a novel method named shallow feature selection that assigns each feature a probability of being selected based on the structure of the training data itself. Because the method is independent of any particular classifier, the high dimension of biodata can be quickly reduced to a size manageable for subsequent processing. Moreover, to improve both the efficiency and the performance of classification, these prior probabilities are further used to specify the distributions of top-level hyperparameters in hierarchical Bayesian neural network (BNN) models, as well as the parameters in Gaussian process models.

RESULTS: Three BNN approaches were derived and then applied to identify ovarian cancer from NCI's high-resolution mass spectrometry data, yielding excellent performance in 1000 independent k-fold cross validations (k = 2,...,10). For instance, average sensitivity and specificity of 98.56% and 98.42%, respectively, were achieved in the 2-fold cross validations. Furthermore, only one control and one cancer sample were misclassified in the leave-one-out cross validation. Some other popular classifiers were also tested for comparison.

AVAILABILITY: The programs are implemented in MATLAB, R and Neal's fbm.2004-11-10.
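As a rough illustration of the evaluation protocol the abstract describes (repeated k-fold cross validation scored by average sensitivity and specificity), the following Python sketch may help. It is not the authors' code: the classifier is a nearest-centroid placeholder rather than a BNN, the data are synthetic, and every function name here (`sensitivity_specificity`, `k_fold_splits`, `fit_centroids`, `predict_centroids`, `repeated_cv`) is hypothetical. How predictions are pooled within a run is also an assumption, since the abstract does not say.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 = cancer, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def k_fold_splits(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Placeholder classifier (NOT a BNN): store per-class feature means."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict_centroids(cents, X):
    """Assign each sample to the class with the nearest centroid."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(cents, key=lambda c: dist(cents[c], x)) for x in X]


def repeated_cv(X, y, k, repeats, seed=0):
    """Assumed protocol: pool held-out predictions per run, average over runs."""
    rng = random.Random(seed)
    sens_runs, spec_runs = [], []
    for _ in range(repeats):
        y_pred = [None] * len(y)
        for fold in k_fold_splits(len(y), k, rng):
            held = set(fold)
            train = [i for i in range(len(y)) if i not in held]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            preds = predict_centroids(model, [X[i] for i in fold])
            for i, p in zip(fold, preds):
                y_pred[i] = p
        s, c = sensitivity_specificity(y, y_pred)
        sens_runs.append(s)
        spec_runs.append(c)
    return sum(sens_runs) / repeats, sum(spec_runs) / repeats


# Synthetic, deliberately separable toy data (20 controls, 20 cancers, 3 features).
toy_rng = random.Random(42)
X_toy = [[toy_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
X_toy += [[5 + toy_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
y_toy = [0] * 20 + [1] * 20

sens, spec = repeated_cv(X_toy, y_toy, k=5, repeats=10)
print(f"mean sensitivity={sens:.4f}, mean specificity={spec:.4f}")
```

The toy data are separable by construction, so the printed values say nothing about real spectra. To mirror the abstract more closely, one would loop `k` over 2 to 10, raise `repeats` to 1000, and swap in an actual classifier for the two centroid functions.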

Similar resources

Probabilistic Contaminant Source Identification in Water Distribution Infrastructure Systems

Large water distribution systems can be highly vulnerable to penetration of contaminant factors caused by different means including deliberate contamination injections. As contaminants quickly spread into a water distribution network, rapid characterization of the pollution source has a high measure of importance for early warning assessment and disaster management. In this paper, a methodology...

A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum

Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...

A novel, high-throughput workflow for discovery and identification of serum carrier protein-bound peptide biomarker candidates in ovarian cancer samples.

BACKGROUND Most cases of ovarian cancer are detected at later stages when the 5-year survival is approximately 15%, but 5-year survival approaches 90% when the cancer is detected early (stage I). To use mass spectrometry (MS) of serum proteins for early detection, a seamless workflow is needed that provides an opportunity for rapid profiling along with direct identification of the underpinning ...

Bayesian Mass Spectra Peak Alignment from Mass Charge Ratios

Proteomics studies based on mass spectrometry (MS) are gaining popular applications in biomedical research for protein identification/quantification and biomarker discovery, especially for potential early diagnosis and prognosis of severe disease before the occurrence of symptoms. However, MS data collected using current technologies are very noisy and appropriate data preprocessing is critical...

An improved structure models to explain retention behavior of atmospheric nanoparticles

The quantitative structure-retention relationship (QSRR) of nanoparticles in roadside atmosphere against the comprehensive two-dimensional gas chromatography which was coupled to high-resolution time-of-flight mass spectrometry was studied. The genetic algorithm (GA) was employed to select the variables that resulted in the best-fitted models. After the variables were selected, the linear multi...

Journal:
  • Bioinformatics

Volume 21 Suppl 1

Publication date 2005